Applied Generative AI for AI Developers
| Technique | Pros | Cons |
|---|---|---|
| Prompt Engineering | Fast, low cost, no retraining. | Limited accuracy, trial and error. |
| Fine-Tuning | Improved task performance. | Requires labeled data, costly. |
| Continued Pre-Training (cPT) | Handles major domain shifts well. | Expensive, needs significant data. |
The Messages API lets you interact with the model in a conversational way: each message declares a role and its content.
- system: Sets the model's tone and scope, providing context for the whole conversation.
- user: Contains the main prompt, question, or task input.
- assistant: Holds prior model turns, used as context in iterative tasks.
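As a sketch, the three roles combine into a message list like this (the exact payload shape varies by provider; some APIs take the system prompt as a separate parameter instead of a message):

```python
# Minimal illustration of the three roles in a chat-style message list.
# The payload shape is illustrative, not any one provider's exact schema.
conversation = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is retrieval-augmented generation?"},
    {"role": "assistant", "content": "A pattern that grounds answers in retrieved documents."},
    {"role": "user", "content": "Give a one-line example."},
]

roles = [message["role"] for message in conversation]
```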
Users can get tailored responses for their use case using the following inference parameters while invoking foundation models:
- temperature – A value between 0–1 that regulates the creativity of the model's responses. Use a lower temperature for more deterministic responses, and a higher temperature for more creative or varied responses.
- top_k – The number of most-likely candidates that the model considers for the next token. Choose a lower value to shrink the pool and limit the options to more likely outputs; choose a higher value to enlarge the pool and allow the model to consider less likely outputs.
- top_p – Controls the token choices made during text generation by considering only the most probable token options and ignoring less probable ones, based on a probability threshold value (p). Setting top-p below 1.0 focuses the model on the most likely token choices, reducing unexpected or unlikely outputs and producing more consistent, predictable completions.
- stop sequences – Strings that end the model's response when generated. For Meta Llama models this value can be "<|start_header_id|>", "<|end_header_id|>", or "<|eot_id|>".

| Model family | Prompt Engineering Reference |
|---|---|
| Amazon Nova | https://docs.aws.amazon.com/nova/latest/userguide/prompting.html |
| Anthropic Claude | https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/ |
| Meta LLaMA | https://www.llama.com/docs/how-to-guides/prompting/ |
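As a sketch of where each parameter lives in a Bedrock Converse request: temperature, topP, and stopSequences sit in inferenceConfig, while top_k is model-specific and is forwarded through additionalModelRequestFields. The helper name below is ours, not part of the SDK:

```python
# Hypothetical helper: assemble Converse request kwargs so each
# inference parameter sits in the field Bedrock expects.
def build_converse_kwargs(prompt, temperature=0.2, top_p=0.9,
                          top_k=50, stop=None):
    return {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "temperature": temperature,   # lower = more deterministic
            "topP": top_p,                # nucleus sampling threshold
            "stopSequences": stop or [],  # cut generation at these strings
        },
        # top_k is not part of inferenceConfig; it is passed verbatim
        # to the model provider via additionalModelRequestFields.
        "additionalModelRequestFields": {"top_k": top_k},
    }

kwargs = build_converse_kwargs("Hello", temperature=0.0, stop=["<|eot_id|>"])
```

These kwargs can then be splatted into `bedrock.converse(modelId=..., **kwargs)`.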
import boto3
import json
# Initialize the Bedrock Runtime client
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
# Define messages using the Converse API (recommended for 2026)
messages = [
    {
        "role": "user",
        "content": [{"text": "Summarize the following text: 'Generative AI is transforming industries by automating creative tasks.'"}]
    }
]

# Call Claude via Bedrock Converse API
response = bedrock.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=messages,
    system=[{"text": "You are a helpful assistant."}]
)

# Print the response
print(response["output"]["message"]["content"][0]["text"])

messages = [
    {
        "role": "user",
        "content": [{"text": """Examples:
Review: 'I love this product!' -> Positive
Review: 'This is the worst service ever.' -> Negative
Review: 'The delivery was on time.' ->"""}]
    }
]

response = bedrock.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=messages,
    system=[{"text": "You are a helpful assistant that classifies sentiment."}]
)
print(response["output"]["message"]["content"][0]["text"])

Text-to-SQL prompt with Meta Llama 3
messages = [
    {
        "role": "system",
        "content":
            """You are a MySQL query expert whose output is a valid SQL query.
Only use the following tables:
It has the following schemas:
<table_schemas>
{table_schemas}
</table_schemas>
Always combine the database name and table name to build your queries. You must identify these two values before providing a valid SQL query.
Please construct a valid SQL statement to answer the following question; return only the MySQL query between <sql></sql>.
"""
    },
    {
        "role": "user",
        "content": "{question}"
    }
]

Extract the relevant information from the following paragraph and present it in JSON format.
Michael Doe, a 45-year-old teacher from Boston, Massachusetts, is an avid reader and enjoys gardening during his spare time.
Example 1:
Paragraph: "John Doe is a 32-year-old software engineer from San Francisco, California. He enjoys hiking and playing guitar in his free time."
"employee": {
"fullname": "John Doe",
"city": "San Francisco",
"state": "California",
"occupation": "software engineer",
"hobbies": ["hiking", "playing guitar"],
"recentTravel": "not provided"
},
Example 2:
Paragraph: "Emily Jax, a 27-year-old marketing manager from New York City, loves traveling and trying new cuisines. She recently visited Paris and enjoyed the city's rich cultural heritage."
"employee": {
"fullname": "Emily Jax",
"city": "New York City",
"state": "New York",
"occupation": "marketing manager",
"hobbies": ["traveling", "trying new cuisines"],
"recentTravel": "Paris"
}

This produces the following output:

"employee": {
    "fullname": "Michael Doe",
    "city": "Boston",
    "state": "Massachusetts",
    "occupation": "teacher",
    "hobbies": ["reading", "gardening"],
    "recentTravel": "not provided"
}
“Prompt engineering is what you do inside the context window. Context engineering is how you decide what fills the window.”
“Context engineering is the delicate art and science of filling the context window with just the right information for each step.”
Instead of hand-tuning prompt strings by trial and error, DSPy has you declare the task as a signature and lets an optimizer compile the prompt for you. Available optimizers include:
| Optimizer | Approach |
|---|---|
| MIPROv2 | Bayesian optimization over instruction space |
| COPRO | Coordinate ascent hill-climbing |
| SIMBA | Self-reflective improvement from failures |
| GEPA | Trajectory reflection and gap analysis |
<context>You are analyzing customer feedback</context>
<instructions>Classify sentiment and extract key themes</instructions>
<examples>...</examples>
<input>{{user_input}}</input>

Respond in JSON format:
{"sentiment": "positive|negative|neutral", "themes": [...]}
Think step by step. Show your reasoning before the final answer.
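A small builder for the tagged layout above (the tag names mirror the example; they are a prompting convention, not an API):

```python
# Assemble the XML-tagged prompt; the structure follows the template above.
def build_prompt(context, instructions, examples, user_input):
    return (
        f"<context>{context}</context>\n"
        f"<instructions>{instructions}</instructions>\n"
        f"<examples>{examples}</examples>\n"
        f"<input>{user_input}</input>\n"
        'Respond in JSON format:\n'
        '{"sentiment": "positive|negative|neutral", "themes": [...]}'
    )

prompt = build_prompt(
    "You are analyzing customer feedback",
    "Classify sentiment and extract key themes",
    "...",
    "Great service, but shipping was slow.",
)
```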
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-2026-01-01",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract data as JSON with keys: name, email"},
        {"role": "user", "content": "John Smith can be reached at john@example.com"}
    ]
)
# Guaranteed valid JSON: {"name": "John Smith", "email": "john@example.com"}

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "extract_contact",
        "description": "Extract contact information",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string", "format": "email"}
            },
            "required": ["name", "email"]
        }
    }],
    tool_choice={"type": "tool", "name": "extract_contact"},
    messages=[{"role": "user", "content": "Contact: John Smith, john@example.com"}]
)

| Use Case | Function Example |
|---|---|
| Database queries | query_database(sql: str) |
| API calls | get_weather(city: str) |
| Calculations | calculate_mortgage(principal, rate, years) |
| File operations | read_file(path: str) |
import boto3
import json
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
tools = [{
    "toolSpec": {
        "name": "get_stock_price",
        "description": "Get current stock price for a ticker symbol",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "ticker": {"type": "string", "description": "Stock ticker symbol"}
                },
                "required": ["ticker"]
            }
        }
    }
}]

response = bedrock.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": "What's Apple's stock price?"}]}],
    toolConfig={"tools": tools}
)
# LLM returns: tool_use with {"ticker": "AAPL"}

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Sonnet 4 | $3.00 | $15.00 |
| GPT-4o | $2.50 | $10.00 |
| Claude Opus 4 | $15.00 | $75.00 |
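As a quick sanity check on the prices above, a back-of-the-envelope estimate (a sketch; real bills also reflect caching and batch discounts):

```python
# Per-million-token prices from the table above: (input, output) in USD.
PRICES = {
    "claude-sonnet-4": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
    "claude-opus-4": (15.00, 75.00),
}

def estimate_cost(model, input_tokens, output_tokens):
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# 10,000 requests at ~1,500 input and ~300 output tokens each
cost = estimate_cost("claude-sonnet-4", 10_000 * 1_500, 10_000 * 300)
print(f"${cost:.2f}")  # $90.00
```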
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a customer service agent for Acme Corp...[long instructions]...",
            "cache_control": {"type": "ephemeral"}  # Cache this block
        }
    ],
    messages=[{"role": "user", "content": "How do I return an item?"}]
)

┌─────────────────────────────────────────┐
│ Cached (5000 tokens) - $0.30/M reads │
│ ┌─────────────────────────────────┐ │
│ │ System prompt + Rules │ │
│ │ Few-shot examples │ │
│ │ Tool definitions │ │
│ └─────────────────────────────────┘ │
├─────────────────────────────────────────┤
│ Dynamic (500 tokens) - Full price │
│ ┌─────────────────────────────────┐ │
│ │ User message │ │
│ │ Conversation history │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────┘
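Using the numbers in the diagram, a rough estimate of what caching saves on input cost (a sketch; it ignores the one-time cache-write surcharge):

```python
# $0.30/M for cached reads vs $3.00/M for uncached input tokens.
def input_cost(cached_tokens, dynamic_tokens,
               read_price=0.30, full_price=3.00):
    return (cached_tokens * read_price + dynamic_tokens * full_price) / 1_000_000

with_cache = input_cost(5_000, 500)    # 5K cached + 500 dynamic tokens
without_cache = input_cost(0, 5_500)   # everything at full price

savings = 1 - with_cache / without_cache
print(f"{savings:.0%} cheaper per request")
```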
import pytest
from your_llm_client import generate_response

class TestSentimentPrompt:
    """Test suite for sentiment classification prompt"""

    def test_positive_sentiment(self):
        result = generate_response("I absolutely love this product!")
        assert result["sentiment"] == "positive"
        assert result["confidence"] > 0.8

    def test_negative_sentiment(self):
        result = generate_response("This is the worst experience ever")
        assert result["sentiment"] == "negative"

    def test_neutral_edge_case(self):
        result = generate_response("The product arrived on Tuesday")
        assert result["sentiment"] == "neutral"

    def test_empty_input_handling(self):
        result = generate_response("")
        assert "error" in result or result["sentiment"] == "unknown"

| Metric | What It Measures | When to Use |
|---|---|---|
| Exact Match | Output == expected | Classification, extraction |
| F1 Score | Precision + Recall | Multi-label tasks |
| BLEU/ROUGE | Text similarity | Summarization, translation |
| LLM-as-Judge | Quality rating by another LLM | Open-ended generation |
| Human Eval | Expert assessment | Final validation |
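Minimal versions of the first two metrics in the table (illustrative implementations, not a benchmark library):

```python
# Exact match: strict equality after light normalization.
def exact_match(pred, gold):
    return pred.strip().lower() == gold.strip().lower()

# F1 for multi-label outputs: harmonic mean of precision and recall
# over the predicted vs expected label sets.
def f1_score(pred_labels, gold_labels):
    tp = len(set(pred_labels) & set(gold_labels))
    if tp == 0:
        return 0.0
    precision = tp / len(set(pred_labels))
    recall = tp / len(set(gold_labels))
    return 2 * precision * recall / (precision + recall)
```

Exact match suits single-label classification and extraction; F1 fits tasks like multi-theme tagging where partial credit matters.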
def validate_input(user_input: str) -> bool:
    # Length limits
    if len(user_input) > 10000:
        return False
    # Known injection patterns
    injection_patterns = ["ignore previous", "disregard instructions", "system:"]
    if any(pattern in user_input.lower() for pattern in injection_patterns):
        return False
    return True

import re
from pydantic import BaseModel, validator

class SafeResponse(BaseModel):
    answer: str

    @validator('answer')
    def no_pii(cls, v):
        # Check for patterns like SSN, credit cards
        if re.search(r'\d{3}-\d{2}-\d{4}', v):
            raise ValueError("Response contains potential PII")
        return v

┌─────────────────────────────────────────┐
│ Layer 1: Input Validation │
│ - Length limits, format checks │
├─────────────────────────────────────────┤
│ Layer 2: Content Moderation │
│ - Pre-flight safety classification │
├─────────────────────────────────────────┤
│ Layer 3: System Prompt Hardening │
│ - Clear boundaries, role definitions │
├─────────────────────────────────────────┤
│ Layer 4: Model-Level Safety │
│ - Built-in model guardrails │
├─────────────────────────────────────────┤
│ Layer 5: Output Validation │
│ - Schema enforcement, PII detection │
├─────────────────────────────────────────┤
│ Layer 6: Human Review │
│ - High-stakes decisions flagged │
└─────────────────────────────────────────┘
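Wiring layers 1 and 5 into a single call path might look like this (a toy sketch; the model call and the moderation and guardrail layers are stubbed):

```python
import re

# Layer 1: cheap input checks before any model call.
def validate_input_layer(text):
    return 0 < len(text) <= 10_000 and "ignore previous" not in text.lower()

# Layer 5: scan the model output before returning it.
def validate_output_layer(text):
    return not re.search(r"\d{3}-\d{2}-\d{4}", text)  # crude SSN pattern

def guarded_call(user_input, model_fn):
    if not validate_input_layer(user_input):
        return {"error": "input rejected"}
    output = model_fn(user_input)  # layers 2-4 happen in/around the model
    if not validate_output_layer(output):
        return {"error": "output blocked"}
    return {"answer": output}

result = guarded_call("What is your return policy?", lambda _: "30 days.")
```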
| Model | Max Context | Effective Use |
|---|---|---|
| Claude Opus 4 | 200K tokens | ~150K reliable |
| Gemini 2.0 Pro | 2M tokens | ~1.5M reliable |
| GPT-4o | 128K tokens | ~100K reliable |
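A crude budgeting check using the common ~4 characters per token heuristic (an approximation only; use the provider's tokenizer for real counts):

```python
# Reserve room for the model's output when checking whether a
# document fits the context window.
def fits_context(text, max_tokens, reserve_output=4_096):
    estimated_tokens = len(text) / 4  # rough chars-per-token heuristic
    return estimated_tokens <= max_tokens - reserve_output

# ~100K estimated tokens fits comfortably in a 200K window
print(fits_context("x" * 400_000, 200_000))
```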
<critical_instructions>
[Most important rules here - model attends strongly]
</critical_instructions>
<context>
[Supporting documents, examples, background]
</context>
<task>
[Current request - model attends strongly]
</task>
| Factor | Long Context | RAG |
|---|---|---|
| Document size | < 100K tokens | > 100K tokens |
| Update frequency | Static/rare updates | Frequent updates |
| Precision needed | “Consider everything” | “Find the needle” |
| Cost sensitivity | Lower volume | Higher volume |
| Latency requirements | Flexible | Strict |
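The table collapses into a rough decision heuristic (our encoding of it, not an official rule):

```python
# Prefer RAG when the corpus is large, changes often, or the workload
# is high-volume or latency-sensitive; otherwise stuff the context.
def choose_strategy(doc_tokens, frequent_updates, high_volume, strict_latency):
    if doc_tokens > 100_000 or frequent_updates or high_volume or strict_latency:
        return "RAG"
    return "long-context"

print(choose_strategy(60_000, False, False, False))  # long-context
print(choose_strategy(500_000, True, True, True))    # RAG
```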
First, provide your answer.
Then, critically evaluate your answer for errors.
Finally, provide your corrected final answer.
| Tool | Type | Key Strength | Link |
|---|---|---|---|
| Cursor | Agentic IDE | Best-in-class UI/UX, ‘Composer’ mode. | cursor.com |
| Windsurf | Agentic IDE | Deep context ‘Flow’, ‘Cascade’ agent. | codeium.com/windsurf |
| Antigravity | Agentic IDE | Multi-agent orchestration, free for individuals. | antigravity.google |
| Claude Code | CLI / Plugin | Research-grade agentic capabilities. | anthropic.com |
| Aider | CLI Tool | Best for terminal users, git-aware. | aider.chat |
| Roo Code | Plugin | Open-source, highly configurable agent. | roocode.com |
Claude Code Architecture
Source: Boris Cherny on X
~/.claude/CLAUDE.md # Global (all projects)
~/repos/org/CLAUDE.md # Organization-wide
~/repos/org/project/CLAUDE.md # Project-specific (most common)
~/repos/org/project/src/CLAUDE.md # Subdirectory (additive)
Key Insight: Files are additive - subdirectory CLAUDE.md appends to parent, doesn’t override.
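The additive lookup can be sketched as walking from the filesystem root down to the working directory and concatenating every CLAUDE.md found (our approximation of the behavior, not Claude Code's actual implementation):

```python
import tempfile
from pathlib import Path

# Concatenate every CLAUDE.md from the root down to cwd, parents first,
# so deeper files append to (not replace) shallower ones.
def collect_claude_md(cwd: Path) -> str:
    parts = []
    for directory in [*reversed(cwd.parents), cwd]:
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            parts.append(candidate.read_text())
    return "\n\n".join(parts)

# Demo on a throwaway tree: a global file plus a project file.
root = Path(tempfile.mkdtemp())
(root / "CLAUDE.md").write_text("global rules")
project = root / "project"
project.mkdir()
(project / "CLAUDE.md").write_text("project rules")
merged = collect_claude_md(project)
```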
# Project Overview
This is a FastAPI e-commerce backend with Stripe integration.
# Tech Stack
- Python 3.12, FastAPI, SQLAlchemy 2.0
- PostgreSQL 15, Redis for caching
- pytest for testing
# Commands
- Run tests: `uv run pytest`
- Start dev server: `uv run uvicorn main:app --reload`
- Lint: `uv run ruff check --fix . && uv run ruff format .`
# Code Style
- Use Pydantic for all data validation
- Private functions start with underscore
- One parameter per line in function signatures
# Architecture Decisions
- All database queries go through repository pattern
- Use dependency injection for testability

Use /init to let Claude generate a starter CLAUDE.md, then refine it.

# For detailed API documentation, see: docs/api/README.md
# For database schema details, see: docs/schema.md

Tell Claude where to find information, not all the information itself.
.claude/skills/
├── deploy/
│ ├── SKILL.md # Skill definition
│ ├── deploy.sh # Supporting script
│ └── config.template # Template file
├── review-pr/
│ └── SKILL.md
└── database-migrate/
└── SKILL.md
---
invocation: explicit # Only when user calls /deploy
context: fork # Run in isolated subagent
agent: general-purpose # Which agent to use
---
# Deploy to Production
Follow these steps to deploy:
1. Run tests: `uv run pytest`
2. Build: `docker build -t app .`
3. Push: `docker push registry/app`
4. Deploy: `kubectl apply -f k8s/`

| Mode | Frontmatter | Behavior |
|---|---|---|
| Explicit | `invocation: explicit` | Only via `/skill-name` command |
| Automatic | `invocation: automatic` | Claude loads when relevant |
| Implicit | (default) | Available but not auto-loaded |
.claude/commands/fix-issue.md → /project:fix-issue
.claude/skills/fix-issue/SKILL.md → /fix-issue
Both create commands! Skills add: directories, frontmatter, subagent support.
In fix-github-issue.md:
┌─────────────────────────────────────────────────────────┐
│ Main Claude Session │
│ ┌───────────────────────────────────────────────────┐ │
│ │ Your conversation history + CLAUDE.md context │ │
│ └───────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────┴────────────┐ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Subagent: Explore│ │ Subagent: Plan │ │
│ │ (Read-only) │ │ (Architecture) │ │
│ │ Own context │ │ Own context │ │
│ └────────┬─────────┘ └────────┬────────┘ │
│ │ │ │
│ └─────────┬───────────────┘ │
│ ▼ │
│ Results returned to main │
└─────────────────────────────────────────────────────────┘
| Agent | Purpose | Tools Available |
|---|---|---|
| Explore | Codebase exploration | Glob, Grep, Read (no Edit) |
| Plan | Architecture design | All read tools, no write |
| general-purpose | Full capabilities | All tools |
| Custom | Your definition | Configurable |
---
context: fork # Creates isolated subagent
agent: Explore # Use Explore agent type
---
# Find Authentication Code
Search the codebase for all authentication-related files.
Look for: login, logout, JWT, session, auth middleware.

┌─────────────────────────────────────────────────────────┐
│ Claude Code Events │
├─────────────────────────────────────────────────────────┤
│ │
│ SessionStart ──► User types prompt │
│ │ │
│ ▼ │
│ UserPromptSubmit ──► Claude processes │
│ │ │
│ ▼ │
│ PreToolUse ──► [HOOK: Validate/Modify] ──► Tool runs │
│ │ │
│ ▼ │
│ PostToolUse ──► [HOOK: Format/Log] ──► Continue │
│ │ │
│ ▼ │
│ Notification ──► [HOOK: Alert user] │
│ │ │
│ ▼ │
│ Stop ──► Agent completes response │
│ │
└─────────────────────────────────────────────────────────┘
| Hook | When | Exit 0 | Exit 1 | Exit 2 |
|---|---|---|---|---|
| PreToolUse | Before tool runs | Continue | Block + retry | Block + error |
| PostToolUse | After tool completes | Continue | - | Show error |
| Notification | On alerts | Continue | - | - |
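A hook command can be any executable; as a sketch, a Python guard that maps a dangerous shell command to a blocking exit code (the payload shape here is a simplified assumption; Claude Code delivers hook input as JSON on stdin):

```python
# Sketch of a PreToolUse guard. Per the table: exit 0 continues,
# exit 2 blocks the tool call and surfaces an error.
def pre_tool_use(payload: dict) -> int:
    command = payload.get("tool_input", {}).get("command", "")
    if "rm -rf" in command:
        return 2  # block with error
    return 0      # continue

blocked = pre_tool_use({"tool_input": {"command": "rm -rf /tmp/build"}})
allowed = pre_tool_use({"tool_input": {"command": "ls -la"}})
```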
{
    "hooks": {
        "PostToolUse": [
            {
                "matcher": "Edit|MultiEdit|Write",
                "hooks": [
                    {
                        "type": "command",
                        "command": "if [[ $FILE_PATH == *.py ]]; then ruff format $FILE_PATH; fi"
                    }
                ]
            }
        ]
    }
}

{
    "hooks": {
        "PreToolUse": [
            {
                "matcher": "Bash",
                "hooks": [
                    {
                        "type": "command",
                        "command": "if echo $TOOL_INPUT | grep -q 'rm -rf'; then exit 1; fi"
                    }
                ]
            }
        ]
    }
}

“Hooks are huge and critical for steering Claude in a complex enterprise repo. They are the deterministic ‘must-do’ rules that complement the ‘should-do’ suggestions in CLAUDE.md.”
| Feature | Description |
|---|---|
| Multi-Agent Orchestration | “Manager” view for parallel agent tasks |
| Artifacts | Rich outputs (screenshots, diffs, recordings) for verification |
| Three Modes | Agent-driven, Review-driven, Agent-assisted |
| Multi-Model | Gemini 3, Claude Sonnet/Opus 4, GPT-OSS-120b |
# macOS (Homebrew)
brew install --cask kiro-cli
# Linux (one-liner)
curl -fsSL https://kiro.dev/install.sh | sh
# Verify installation
kiro-cli --version

# Login with AWS Builder ID, GitHub, or Google
kiro-cli auth login
# Start interactive chat
kiro-cli chat
# Select model (Auto recommended)
# Options: Auto (1x), claude-sonnet-4.5 (1.3x), claude-haiku-4.5 (0.4x)